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The Laser Interferometer Space Antenna (LISA) defines new demands on data analysis efforts in 
its all-sky gravitational wave survey, recording simultaneously thousands of galactic compact object 
binary foreground sources and tens to hundreds of background sources like binary black hole mergers 
and extreme mass ratio inspirals. We approach this problem with an adaptive and fully automatic 
Reversible Jump Markov Chain Monte Carlo sampler, able to sample from the joint posterior density 
function (as established by Bayes theorem) for a given mixture of signals "out of the box", handling 
the total number of signals as an additional unknown parameter beside the unknown parameters 
of each individual source and the noise floor. We show in examples from the LISA Mock Data 
Challenge implementing the full response of LISA in its TDI description that this sampler is able to 
extract monochromatic Double White Dwarf signals out of colored instrumental noise and additional 
foreground and background noise successfully in a global fitting pproach. We introduce 2 examples 
with fixed number of signals (MCMC sampling), and 1 example with unknown number of signals 
(RJ-MCMC), the latter further promoting the idea behind an experimental adaptation of the model 
indicator proposal densities in the main sampling stage. We note that the experienced runtimes 
and degeneracies in parameter extraction limit the shown examples to the extraction of a low but 
realistic number of signals. 



I. MOTIVATION 

One distinguishing feature of the field of astronomy in 
comparison to other branches of physics is the inability 
of astronomers to perform repetitive experiments when 
comparing models or hypotheses. Instead, all compet- 
ing hypotheses must be compared in light of the avail- 
able observational data, which has encouraged the use 
of statistical inference when weighing hypotheses or es- 
timating physical parameters. In recent years Bayesian 
inference has played an important part in this process, 
with its ability to update the relative probabilities of rel- 
evant models given individual observations which are not 
repeatable. 

Our work is motivated by the analysis challenges posed 
by the Laser Interferometer Space Antenna (LISA), 
an ES A/NASA laser interferometric mission to map 
the gravitational wave sky in the frequency range ~ 
10" 4 Hz - 1Hz with a launch window of 2018+. LISA 
will observe a multitude of sources with a range of sig- 
nal to noise ratios, from the thousands in the case of 
massive-black hole coalescence down to order unity and 
below from a variety of other sources. It will also likely 
observe tens of thousands of individual sources, the vast 
majority being galactic stellar mass binaries so abundant 
that in the low frequency portion of the spectrum ( ^ 3 
mHz) white dwarf binary systems effectively produce a 
stochastic foreground of gravitational waves. LISA data 
analysis therefore comprises a global analysis problem, 
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which is interesting in itself as a new analysis problem, 
and of vital importance for the full science exploitation 
of this new mission. The total number of sources is un- 
known, the number of parameters that need to be esti- 
mated is very large and the instrument noise is not read- 
ily estimated from the data stream. This is an analy- 
sis problem for which Bayesian inference with a known 
model on the dataset could well provide a very powerful 
tool. One of the key issues is that the variety of possible 
sources is very considerable - white-dwarf binaries, mas- 
sive black holes, extreme-mass ratio inspirals and bursts 
from cosmic strings all produce distinct signals - and it is 
therefore appealing to develop an analysis method which 
is as robust as possible, in the sense that it does not re- 
quire specific tuning for the signal template on which it 
is applied. 

Although the theoretical set-up is straightforward, 
the practical implementation of Bayesian inference can 
be challenging due to the difficulty in computing 
multi-dimensional integrals on large dimensional spaces. 
Markov chain Monte Carlo (MCMC) techniques for a 
given fixed number of total signals and their exten- 
sion Reversible Jump Markov chain Monte Carlo (RJ- 
MCMC) methods for an unknown total number of signals 
have become very popular to manage these integrations, 
as they effectively address this issue by sampling from 
a distribution proportional to the joint posterior density 
that one wishes to evaluate. One of the key difficulties 
with (RJ)MCMC methods is seen in probing the param- 
eter space in a way which is both thorough (one does not 
want to miss important regions of the space that con- 
tribute to the posterior density) and efficient (one wants 
to ensure that the algorithm converges quickly to a solu- 
tion). Former work by Umstatter et al. (pQ) showed the 
ability of RJ-MCMC approaches to untangle the stochas- 
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tic foreground, in research directly targeted to demon- 
strate the ability of RJ-MCMC methods to probe the 
parameter space thoroughly even in case of an unknown 
total number of signals. However, this research neglected 
the LISA response function and its computational cost, 
and thus amplitude and phase modulations induced by 
the position and orientation of the source in the sky. Cor- 
nish and Crowder (2) showed the ability to quickly and 
reliably estimate parameters from individually resolvable 
binary systems, but in the case where the total number 
of signals is known. The problem of untangling the DWD 
stochastic foreground was subsequently tackled in Crow- 
der and Cornish (3), an MCMC approach that tries to 
extract the loudest signals in an iterative detect and ex- 
tract/subtract approach. We note that though this ap- 
proach performs efficiently, the method itself is not op- 
timal as each time one subtracts a source from the data 
one introduces errors from a) numerical or floating point 
errors and b) from the signal trace itself if the recov- 
ered signal does not match entirely the injected signal. 
Furthermore, the burden gets larger and larger the more 
signals one extracts. Trias et al. (4) explores another 
MCMC approach with fixed numbers of signals, which 
used a delayed rejection MCMC scheme to efficiently ex- 
plore the multimodal structure of the likelihood function. 

In contrast to the approach outlined here, these ap- 
proaches all required some manual tuning of the param- 
eters, whereas the adaptive nature of the RJ-MCMC al- 
gorithm we present makes this unnecessary. 

Other applications of MCMC methods to gravitational 
wave data, van der Sluys et al. (|5j) tested adaptive MCMC 
methods in case of LIGO data and injected spinning black 
hole binaries. Spinning black hole binaries impose a prob- 
lem to MCMC data analysis as the parameter space is 
large and correlated. The adaptation builds on top of 
methods described in this paper, but add more classi- 
cal approaches (manually finetuned) like parallel temper- 
ing or simulated annealing in order to perform. Rover 
et al. (j6]) introduced a coherent analysis pipeline for the 
LIGO /VIRGO network in case of non-spinning black hole 
binaries, which did not use any sort of adaptation and is 
thus dependent on manually finetuning efforts. Other 
template-based methods have also been applied to the 
DWD source confusion problem, see e.g. Whelan et al. 
0, in which a grid is laid out in parameter space in 
the hope to catch the loudest signals if they are near the 
grid knots. Besides the disadvantage in the construction 
and performance on a grid of this approach, it was also 
found to result in a low detection efficiency. In the given 
example reference more than roughly two-third of all 
the injected waveforms, even the loudest waveforms with 
SNR>40, were missed as the grid search did not adapt 
to the data set under consideration. Another promising 
probabilistic template-based approach is found in Feroz 
et al , F. and Gair, J. R. and Hobson, M. P. and Porter, 
E. K. dHJ), in which the authors apply a multimodal nested 
sampling algorithm which is designed to efficiently eval- 
uate the Bayesian evidence and return posterior proba- 



bility densities for likelihood surfaces containing multiple 
secondary modes, applicable but not aplied to the LISA 
DWD source confusion problem. The authors demon- 
strate the application to two non-spinning supermassive 
black hole binary signals. The algorithm was found to 
rapidly identify all the modes of the solution and recover 
the true parameters of the sources to high precision. 

In this paper we present a new method to carry out 
Bayesian inference using an automatic reversible jump 
Markov chain Monte Carlo (RJ-MCMC) method: auto- 
matic means that the RJ-MCMC sampler is not tuned 
to work on a specific signal template, but works out- 
of-the-box on any signal template or set of signal tem- 
plates and posterior probability density functions. This 
implementation is built from the Auto-Mix sampler orig- 
inally developed by Hastie (9 ) in the context of statistical 
medicine. Our implementation is totally generic: it can 
be applied to a model of arbitrary dimensionality, it is 
trans- dimensional - that is the total number of parame- 
ters of the model is one of the unknowns that needs to be 
determined - and it assumes that the noise affecting the 
observations is unknown. This technique has been suc- 
cessfully applied, in full or in part, to a number of (simpli- 
fied) problems in a wide range of data analysis contexts: 
observations of kludge waveforms for massive black holes 
and extreme-mass ratio inspirals in LISA ([TO)) , spinning- 
binary systems in a network of ground-based laser in- 
terferometers (5), and single source white dwarf binary 
data sets in the first round of the Mock LISA Data Chal- 
lenges (MLDC) ([TT]) . In this paper we provide for the 
first time a detailed description of the method and dis- 
cuss its efficiency. We do so by applying it to the data 
sets distributed by the MLDC Task Force in the context 
of the MLDC round 1 and 2 as restricted to White Dwarf 
binary systems. For this specific LISA application we si- 
multaneously estimate the noise level along with the pa- 
rameters of the model. This paper focuses on the case in 
which the total number of sources contained in the data 
set is unknown, but will also consider specific examples 
with known total number of signals to verify individual 
parts of the full RJ-MCMC application. 

The structure of the paper is as follows: in Sec. [TT] we 
briefly review the model of gravitational wave signals and 
LISA output that is used in this paper to demonstrate 



our algorithm; Sec. Ill contains the description of the 
new algorithm and its present implementation; in Sec. [V] 
we present the results of the application of this method 
to selected data sets and white-dwarf binary sources; in 
Sec[VT]we provide a critical assessment of the status of the 
work in the context of LISA data analysis and pointers 
to future work. 



II. SIGNAL MODEL 

We follow the conventions adopted by the Mock LISA 
Data Challenge (MLDC) Task Force in the description of 
GW sources and Time Domain Interferometry (TDI) ob- 
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servable (fT2) [T3| fTi) [T5)) ; however, the analysis approach 
that we present here is completely general and can be 
applied to any waveform and TDI read-out. 

Throughout this paper we will concentrate on GWs 
generated by Double White Dwarf binary systems 
(DWD) whose frequency is assumed to be monochro- 
matic in the source reference frame. The two indepen- 
dent gravitational waves (as differentiated by their po- 
larization states + and x) are then described by a 7- 
dimensional parameter vector A 

\ = {A,l,1>,0,<I>Jo,M , (1) 

displaying the overall amplitude A, four parameters that 
describe the binary system's geometry - the position in 
the sky identified by the polar latitude and polar longi- 
tude (/>, and the (fixed) orientation of the orbital angular 
momentum vector that we parametrize by the inclina- 
tion angle t and the polarization angle ip - the initial fre- 
quency fo in the source reference frame and an arbitrary 
constant phase <po at some reference time. We intend to 
infer these signal parameters, and a parametrisation of 
the noise floor, by means of a goodness of fit measure, 
described later in this paper. 

In the source reference frame the two polarization 
states of the GWs read 

hl(t) = A(l + cos 2 i) cos(27r/ t + O ) , (2) 
h s x {t) = -2Acos^sin(27r/ o ^ + 0o). (3) 

From Eqs. ^ and ([3| one can derive the waveforms in 
the Solar System Barvcenter ([32)) : 

h(t- A) = h+(t) + ih x (t) = e~ 2i ^ [hl(t) + ih s x (t)] . (4) 

LISA does not directly observe the individual polariza- 
tions h+ and h x (Eq. [2] and Eq. [3| from Double White 
Dwarfs, but linear combinations of them, encoded in the 
one-way Doppler links, from which one synthesizes the 
so-called Time Delay Interferometry (TDI) observable. 
In this paper we follow the convention of the LISA de- 
scription adopted for the MLDCs, and we therefore work 
with TDI variables of generation 1.5. The interested 
reader can find a review of the technique of Time De- 
lay Interferometry in Tinto and Dhurandhar (16), and 
references therein. A conversion table between different 
TDI conventions is available at M. Vallisneri ([T7)h and 
first, modified (i.e. 1.5) and second generation TDI vari- 
ables were introduced in Armstrong et al. (18), Cornish 
and Hellings (19) and Tinto et al. ([20]); Shaddock (|2T|) . 
respectively. 

The raw data from which our analysis starts are the 
TDI-1.5 Michelson observable X, Y and Z. They are 
free from the (otherwise overwhelming) contribution from 
laser noise fluctuations, but the noise is still correlated. 
However, one can construct a new set of TDI variables 
that are orthogonal. In order to do so, we follow the 
standard procedure outlined in Prince et al. (22) and 
diagonalise the noise covariance matrix of X, Y and Z 



in order to obtain the uncorrelated TDI outputs pseudo 
A', E' and T'(33); their expression in terms of the TDI 
Michelson outputs is: 

A' = l -(2X-Y-Z), (5) 
E> = -±(Z-Y), (6) 

T' = -^(X + Y + Z). (7) 

In the applications of our analysis approach we will con- 
centrate on signals at frequencies smaller than w 3 mHz. 
In fact, in this regime the wavelength of GWs is longer 
than the LISA arm-length and one can introduce ap- 
proximations to the LISA response (the so-called long- 
wavelength approximation), while preserving the fidelity 
of the signal recorded at the detector. In our numerical 
implementation we follow the strategy implemented in 
the LISA Simulator ([23)) and particular in Crowder and 
Cornish ([3]) in which h+ and h x as wrapped within the 
TDI stream are modelled directly in the Fourier domain. 
The choice of working in the long-wavelength approxima- 
tion is entirely driven by computational reasons: it has no 
impact on the generality of our approach, but allows us 
to perform the analysis much faster and therefore explore 
a larger number of cases. In the low-frequency regime, T' 
is essentially insensitive to GWs ([24)) , and therefore only 
A' and E' carry astrophysical information. Those are the 
TDI variables on which we will perform the analysis. 

In summary, the data set that we are considering can 
formally be described as: 

K 

daU) = Ml) + V*(/; **) > a = A 'i E ' ; ( 8 ) 

k=l 

where d a represent the data of the a— th TDI output 
and n a the relevant noise contribution; h a ^ is the GW 
signal at the a-th TDI output produced by the k— th 
DWD characterized by the unknown parameter vector 
A/e, see Eq. ([!]), with an unknown total number K of 
DWDs in the data (k = l,...,if). The instrumental 
noise is modelled as a Gaussian and stationary random 
process with mean and variance given by: 

R(/H(/')> = \T5 ab S(f-f) a,b = A',E'(9) 
(n a (f)) = 0; (10) 

in Eq. ([9| S(f) is the one-sided noise spectral density of 
the TDI variables A! and E' \ which is identical, and T the 
duration of the observation (this should not be confused 
with the TDI variables T' and/or T, which we will not 
need to consider in the rest of paper). In the following 
we will use the notation {d a } to indicate the joint data 
set A' and E' . 

The power of the monochromatic signal at the LISA 
output is spread over several frequency bins, due to 
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LISA's motion during the typical observation lasting 
mont hs-t o-year s. LISA's change of orientation causes a 
frequency shift A Q / = 2/T « 6.5 x 1(T 8 Hz; LISA's mo- 
tion around the Sun with velocity v§ ~ 30 kms -1 pro- 
duces a Doppler modulation with typical width A^f w 
(v©/c)/o « 10 -7 (/o/l mHz) Hz. Both these frequency 
shifts are much smaller than the range over which S(f) 
varies; we will therefore assume the noise to be constant 
over the relevant restricted frequency band of the white 
dwarf, so that S(f) « const = So- In the analysis pre- 
sented here we will assume So to be unknown. The pa- 
rameter vector that we need to determine is therefore: 



K(K) = {{X k ,k = l,...,K},S ), 



(11) 



with K usually referred to as the model indica- 
tor/selector] and we will indicate with M = 7 x K + 1 
the number of dimensions of this vector. 



III. BAYES' THEOREM 

Bayes' theorem follows directly from the product rule 
in probability theory and provides a rigorous mathemati- 
cal prescription to assign the probability density function 
(PDF) to a model m given a data set d within some world 
view W. which represents all relevant prior information 
(e.g. Bretthorst (|25)Q : 



p(m\d, W) 



p(m\W) C(d\m,W) 
p(d\W) 



(12) 



In the previous expression, p(m\d, W) is the posterior 
probability density function of the model given the data; 
C(d\m,W) is the likelihood function of the data given 
the model, which quantifies how the degree of belief in 
the model is affected by the observations; p(m\W) is the 
prior probability of the model, which quantifies our state 
of belief prior to (new) observations, and p(d\W) is the 
evidence (or marginal likelihood), the probability of the 
data given only the background information. 

For the problem at hand the model is represented by 
the unknown parameter vector A and the data set is given 
by {^o}- In the following we will drop, to simplify no- 
tation the explicit reference to the background informa- 
tion W. By applying Bayes' theorem (12) to this specific 



problem, we aim at computing the joint posterior den- 
sity function (PDF) p(A\{d a }) of A given the data sets 
dA and d^, which is given by 



P(A|R}) = 



p(l)C{{d a }\K) 
P({d a }) 



(13) 



Due to the fact that the TDI observables A and E are 
independent, the likelihood function is simply: 



C({d a }\A) = l[C(d a \A) 



(14) 



One important feature of the LISA data set - both 
at the conceptual and practical level - is that the num- 
ber of signals present in any given data stretch is not 
known a priori. As a consequence, the dimensionality 
M, see Eq. ( [ll| , of the parameter vector A is itself one 
of the unknowns in the analysis. Such problems are usu- 
ally called, in the Bayesian literature, trans- dimensional; 
the automatic approach that we provide here to tackle 
such a scenario represents the main novelty of the paper. 
We will start by considering the 'standard' scenario in 
which M is fixed and known; we will then generalize our 
approach to the case where M is unknown. 



A. Known number of signals 

In this Section we consider the case in which the total 
number of signals present in the data set, K, is known, 
though in general we still assume that the noise spectral 
level 5o is unknown, and one of the parameters that we 
want to determine. 

Due to the large dimensionality of the problem, one is 
interested in the PDF of a given parameter or the joint 
PDF of two parameters, say p(Ai \{d a }) out of the (many) 
that constitute the unknown parameter vector; the PDF 
is obtained by integrating the joint PDF over the param- 
eters other than A^, to obtain what is often referred to 
as the marginalized PDF 

p(Ai\{d a }) = JdAi...J dA^ 

x dA i+1 . . . dA M p(A\{d a }) ■ (15) 



The likelihood C(d a \A) is given by ((25 



C(d a \A) oc exp 



mi (A) | s 



(16) 



in the complex domain; we denoted with subscript j 
the discrete nature of the data and consecutively model, 
with each data point separated from the other by a 
constant sampling frequency interval Af as found in 
Af = 1/T with T the sampling time of the data. 



B. Unknown number of signals 

When the number of signals present in the data set is 
not know - as it is the general case for LISA - the joint 
posterior PDF that one wants to compute is not simply 
restricted to the parameter vector A, but also includes 
K, the total number of GW signals present in the data 
set. It is clear that in this case the dimensionality of A 
depends on K (denoted by A(fc),fc e [0,K]), cf Eq. |ll])). 
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The PDF is therefore given by 

p{K)p{A{k)\K)C{{d a }\K,A{k)) 



p(A(k),K\{d a }) = 



P({d a }) 



and Eq. (14| becomes now 



£({d a }\K, A(k)) = n C(da\K, A(k)) . 



(17) 



(18) 



The joint posterior PDF, Eq. ( 17J) can be separated into 
a posterior PDF for the mode! 
rameter vector A(fc) 



indicator K and the pa- 



p(A(k),K\{d a }) = p(K\{d a })p(A(k)\K, {d a }) (19) 

and the same marginalized PDF may be applied for each 
individual posterior for the parameter vectors of each 
model: 

p{Ai\K,{d a }) = JdA 1 ...dA NP {K{k)\K,{d a }). (20) 



IV. AN ANALYSIS PIPELINE 

The goal of our analysis pipeline is to construct a 
Markov chain whose elements have a distribution ir(y) 
proportional to the posterior PDF (target distribution), 
with y denoting states (or members) of the chain pop- 
ulated by values of the parameter vector A and/or the 
model indicator K. Once this is done, a simple his- 
togram of the chain yields the (approximated) posterior 
distributions of interest. The distribution of the states 
approximate the target distribution if the chain is built 
time-homogenous, reversible and ergodic (thus aperiodic 
and positive recurrent) (hereafter denoted in summary 
"convergence criteria", see e.g. Haggstrom (26)). Time- 
homogenous states that the move of the random walk 
is determined by a single transition probability, or tran- 
sition kernel, between the old state and the new state 
solely. The condition of reservibility forces the distribu- 
tion 7r(y) to become stationary. Ergodicity enables the 
chain to return to the same state within a finite time, 
possible at every iteration of the chain progression. 

We construct a transition probability or kernel by in- 
troducing proposal density functions g y , y >{-) that control 
the transition from the current state y to the new state 
y' . The quality of proposal density functions can directly 
be measured in their performance in guiding the chain 
to fulfil the above convergence criteria in the shortest 
amount of time (called "burn-in" time of the sampler). 
The choice of g y , y '(-) is then clearly motivated by the 
problem at hand: as a consequence, for each class of 
signals on which one wants to apply the method some 
(usually non-negligible) tuning of the algorithm is nec- 
essary. The methods that we propose here are aimed at 
addressing this issue in an automated way accordingly. 



Our sampler uses two different kinds of proposal den- 
sities, both subject to adaptation schemes. The first pro- 
posal density suggests a transition from one state y to a 
new state y' with explicit dependence on the old state, 
i.e. 



V = y + 9y,y'(u>) 



(21) 



A random number u G J7[0, 1] selects the actual value 
from the proposal density with which the new state is 
formed. The width of the proposal distribution is an 
important factor in the performance of the sampling al- 
gorithm, and thus the subject of adaptation. If it is too 
great, proposals will fall mostly outside the mode of the 
underlying target distribution and be rejected; too small 
and the chain will not explore the full range of the mode 
efficiently. Such types of proposed jumps are here la- 
belled "dependent". Their advantages include ease of im- 
plementation, however there is evidently auto-correlation 
within the chain which must be eliminated by appro- 
priate thinning (keeping only every n-th member of the 
chain) . 

Alternatively, proposal densities can be constructed in- 
dependently of the old state, 



y =9y,y'{u). 



(22) 



In this case the proposal distribution is static and not 
relative to the old state. In order to probe adequately 
the whole posterior PDF, the proposal should envelop 
the PDF itself and, in order to achieve highest efficiency, 
it should mimic the posterior PDF as much as possi- 
ble, so that new states are more likely from the favoured 
regions of the PDF than from the low density areas of 
the PDF. To accurately match the posterior distribution 
would require detailed information of its structure and 
maxima, the very information we are trying to obtain, 
and is therefore impossible to specify in advance of ex- 
amining the PDF. This makes this kind of proposal dif- 
ficult to implement efficiently, since an over-conservative 
proposal with too broad a distribution will yield a low 
acceptance rate, and vice versa. The shape and loca- 
tion in parameter space of this proposal is thus the sub- 
ject of adaptation. Independent proposal distributions 
show their biggest advantage in yielding completely un- 
biased chains with no auto-correlation, and if successfully 
matched to the target distribution the high acceptance 
ratio will allow a fast and well-mixed exploration of the 
parameter space. 

The RJ-MCMC sampler that we implement is based on 
the AutoMix sampler created by Hastie (9) and works in 
two stages: 

MCMC preruns In the first stage the fixed dimension 
posterior PDFs for each proposed model are com- 
puted individually in turn. The goal of this first 
stage is to generate independent proposal densities 
that are necessary in the second and final transdi- 
mensional stage of the analysis according to a fit to 
found PDFs. We implement an MCMC algorithm 
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Figure 1: The flow chart of our RJMCMC sampler. For de- 
tails see text. 



based on a Random Walk Metropolis (RWM) sam- 
pler with multivariate Gaussian proposal distribu- 
tions. We adapt in the burn-in phase according to a 
modified Adaptive Acceptance Probability (AAP) 
technique, in its general form first introduced by 
Atchade and Rosenthal ([27)) . 

RJMCMC main run In the second stage the PDF for 
the model indicator K is additionally and simulta- 
neously estimated. The full posterior of the prob- 
lem is therefore generated. Model parameter PDF 
estimates are subject to AAP adaptation in the 
burn-in of the transdimensional RJ-MCMC chain, 
model indicator PDFs are subject to AAP adapta- 
tion after the chain burned-in. 

The flow chart in Fig. [I] summarizes the following 
propsed individual steps and the overall structure of the 
pipeline. 



A. Adaptation of "dependent" proposals: MCMC 
preruns 

Assume the present state of the chain, say its n— th 
element, is y. A new state y' (the n + 1 element of the 
chain) is proposed according to a proposal density prob- 
ability g yi y>(u); we indicate with g y > , y (u r ) the proposal 



density function for the inverse transition, from y' to y. 
The algorithm is based on the Metropolis acceptance ra- 
tio a to control the transition from the present to the 
new state with acceptance probability 



a 



y,y' 



mm 



(23) 



If a y,y' > 1 the n 
OL y ^y> < 1, the n H 



- 1 element of the chain becomes y' . If 
1 element is y r with probability ot y ^ y > . 
This is in practice achieved by comparing 7r(y f )/7r(y) to a 
random number drawn from a uniform distribution in the 
range C/[0, 1]: if this number is smaller than 7r(y') /ir(y), 
then the next element of the chain is indeed y'\ if this 
number is greater than 7r(y f ) /7r(y), than the n+l element 
of the chain remains y. g y , y '(-) is set to a multi-variate 
Gaussian distribution of rank M given by 



(n) _ 

9y,y' ~ 



where 



(27r) M det||(0- 



-1/2 _i 

e 



a 



Vi 

{n) 



A 



Sij 



(n) 



(v'i-Vi)[C%'\ (Vj-Vj) 

(24) 

(25) 
(26) 



The index n in Eq. (24) highlights that the proposals 
change, i.e. is "adapted", as the chain evolves using an 
AAP technique. The initial state of the chain n = is 
chosen randomly in parameter space. The mean in each 
dimension is set to the the actual value of the component 
of A, associated to the current state y, with A- n_0 ^ arbi- 
trary. The variances ^^ n- °^ for the first proposal are 

also set arbitrarily, and as the chain progresses are tuned 
automatically ("adapted") using an AAP technique, i.e. 
by dividing the initial value of the component of A by a 
given factor. 

AAP techniques provide a way of changing the pro- 
posal density function g y , y '(-) in an automatic way as 
chain evolves in order to maximize the efficiency of sam- 
pling from the target distribution. In this particular ap- 
plication g y , y >(-) is adapted by acting solely on the diag- 
onal elements of the matrix C^ n \ Therefore the proposal 
distribution for each parameter remains independent of 
the others. - uncorrelated proposals. The AAP algo- 
rithm uses a target acceptance probability r(<j/ c ) for op- 
timization, which is defined as 



-(n) 



dydy' a y , y > gfy • 



(27) 



This means that the average acceptance of new states 
should be given by r(cr^); in case of higher or lower ac- 
ceptance rate the sampler fine-tunes itself towards r(cr/ e ). 
The value for the target acceptance probability is set to 
r t =0.25, close to the asymptotically optimal acceptance 
probability for RWM sampler at 0.239 under specific con- 
ditions of the target ([28]) and rounded up to yield a con- 
servative measure for generic targets. 
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The way in which the variance in each dimension is 
updated follows the following scheme. If the chain makes 
the transition to y\ then at the n + 1 state the variance 
gets updated according to 

at +1) = max {o,^> - a<°> ^ (r t - 1) j , (28) 



otherwise the chain remains at state y and we set 

-) 4, (29) 



'1 \ 2 3 

(n+1) J n (n) (0) / 1 \ 

r- ' = max < 0, cr- — cr- 



We note, that this approach does not equal the original 
AAP algorithm from Atchade and Rosenthal (27), but 
states an alternative formalism first developed by Hastie 
(|9]) to accommodate not only Gaussian proposal densi- 
ties but also Student-T proposal densities, and to ensure 
convergence in a wider field of applications. The AAP 
algorithm is proven to be ergodic and time-homogenous 
Atchade and Rosenthal ([27|) ; Andrieu et al. (29), but 
it is questioned in its reversibility. Andrieu et al. ([29)) 
and Hastie ([9]) showed convergence of the AAP, thus re- 
versibility and following the fulfilment of convergence cri- 
teria for the underlying MCMC chain for a wide class 
of "well-behaved" target distributions, e.g. steady and 
continuos targets; we therefore feel it safe to apply the 
AAP to our motivated problem at hand. Nevertheless 
the word of caution of non-convergence has to be taken 
seriously, which leads us to apply the AAP for depen- 
dent proposal adaptations only in the burn-in stage of 
the sampler, which is discarded after the run finishes. 



B. Adaptation of "independent" proposals: the 
RJMCMC main run 

The RJ-stage implements dependent as well as in- 
dependent proposal densities in order to sample from 
the joint transdimensional PDF. Independent proposals 
are used solely within transdimensional moves (selecting 
a new model indicator if), while dependent proposals 
propagate the MCMC state within the current model K. 
Within transdimensional moves, we find two applications 
of independent proposal densities in our sampler. The 
first establishes a proposal for each dimension of the pa- 
rameter vector A(k),k G [0,if], based on the discoveries 
for the model specific PDFs from the prerun. This con- 
stitutes a single adaptive process outside the sampling 
algorithm. The possible choices of initial shape for the 
proposal that is adapted towards the recovered PDF is 
very large. For the purpose of the analysis of a large num- 
ber of Double White Dwarf binaries, a high dimensional 
problem, we found multivariate Gaussians with correla- 
tion within one signal, thus within one group of seven 
parameters that characterize a signal, but no correlation 
between signals to yield the only reliable and stable con- 
figuration at this stage of development. We therefore do 



not adapt to the full shape of the posterior, but restrict 
the method to adapting to the shape of the marginal 
posterior for each individual signal, including any corre- 
lations between its parameters, but not between distinct 
signals. 

The second adaptation scheme is found in the proposal 
for the model indicator K, which is subject to adapta- 
tion according to a different specification (RJ adapta- 
tion algorithm B in Hastie (9), page 12 Iff) of the general 
AAP scheme. We note upfront that ergodicity or conver- 
gence of this scheme has not been proven theoretically 
to full extent Hastie ©. Andrieu et al. ([29]) have shown 
that four conditions need to be served in order to guar- 
antee ergodicity of the adaptation algorithm. Three of 
these conditions (Al,A2,A4) have been met, and Hastie 
© established empirically that (A3) is met at least in 
well defined target distributions, as is the case for steady 
and continuos PDFs. We therefore value the benefits of 
the proposed scheme higher than the doubts. Proposed 
adaptation scheme targets the adaptation of the indepen- 
dent model selector proposal according to found posterior 
PDF. As discussed earlier, matching the shape of an in- 
dependent proposal density to the posterior of the prob- 
lem drastically improves the efficiency of the sampler, 
a desirable result for large multidimensional problems. 
Bringing the argument about efficiency further into play, 
we see it prohibitive to start a second prerun to establish 
empirically said shape, but rather adapt this quantity 
on-the-fly with ongoing RJ-MCMC sampling after the 
burn-in phase completed, and therefore the model selec- 
tor posterior is established/ visible. It is then crucial to 
value the "Markov" property of the chain, which states 
that only the immediate last state can be used to estab- 
lish the next state; which leads us to the details of the 
following diminishing adaptation AAP algorithm. 

We first rewrite the Metropolis-Hastings ratio in its 
trans-dimensional formalism, and denote by g yyy >(u) 
the independent proposal density for the model specific 
RWM parameter 



a 



y,y' 



mm 



1 ^W)9y'A u ') 
' n(y)9y,y'(v>) 



d(y',u') 



d{y,u) 



(30) 



States y and y' no longer need to have the same dimen- 
sion, with the difference in dimension accounted by the 
inclusion of the Jacobian determinant for a move from 
state y to state y' ([34]) . 

We introduce K models which are numbered with 
k (k G 1...K), and initially assign to each model the 
same probability (prob(/c) = l/K); the sampler tries to 
jump to each individual model with the same probability. 
After proposing a new model and after evaluation of the 
transdimensional Metropolis Hasting ratio, the model in- 
dicator probability for the new model (in case of accep- 
tance the proposed model selector value, otherwise the 
old) is adapted according to 



prob(/c) = prob(&) 




n 2/3 \ 

(l-prob(fc))J (31) 
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For all other remaining model indicators we perform 
prob(fc') = prob(fc') + (j^j ' (0 - prob(fc')) j , (32) 

with n stating the current number of MCMC state. 

In order to ensure that adaptation is performed accord- 
ing to the intrinsic posterior distribution of the model se- 
lector, we reset the proposal densities to a uniform spread 
each time one model is adapted towards a probability 
lower than prob(fc) < 1Qx ( 1 r+1 ) , with r the total number 
of resets; reset thus less often the longer we sample. 



C. Block updates 

A very important feature of our sampler is its ability 
to efficiently block update parameters of a given model. 
The posterior PDF of interest is unknown prior to the 
analysis; one suitable and safe way of probing its charac- 
teristics would therefore be to update one parameter of 
a model after the other, thus to carefully crawl through 
parameter space. However, since our motivation is found 
in LISA and the MLDC round 2, we see models propos- 
ing a vast number of Double White Dwarf signals. Each 
individual DWD signal has seven parameters; updating 
one parameter after the other in case of large number of 
signals is therefore unfeasible. On the other side, block 
updating all the parameters of a given model at once can 
lead to very low acceptance rates if there is a high degree 
of correlation between two or more parameters. In that 
case, a change in one parameter will result in a greatly 
reduced likelihood unless it is accompanied by a change 
in the other correlated parameters. A careful balance be- 
tween block updating and single parameter updates has 
thus to be found to make the sampler efficient, and ensure 
that convergence is reached within a feasible timescale. 

We borrow the concept of "exponential crossover" as 
used in genetic algorithms ([30]) to achieve such a mixed 
update strategy primarily in the first stage of the sam- 
pler. We group randomly parameters of a given model 
until a simultaneously drawn random number in U[0,1] is 
above a given threshold. In our case this threshold was 
set to 0.7. The probability to stop collecting parameters 
is therefore 30% for each draw. Resulting blocks typi- 
cally span 6 to 25 parameters for one update attempt if 
the number of signals is held large. The random charac- 
ter of this blocking ensures statistically unbiased combi- 
nations of parameters and optimal mixing of the chain. 
The above procedure is repeated until the total number 
of all grouped parameters so far equals or exceeds the 
total number of parameters in a given model; we then 
collect all the individual update results to build our new 
state in the Markov chain. The RJ-MCMC main run sets 
a threshold of 0.9, and therefore blocks more aggressively 
utilizing the progressively finetuned within-model RWM 
sampler. 



V. RESULTS 



In this section we demonstrate the analysis approach 
that we have described in the previous section, by apply- 
ing this technique on selected data sets as distributed in 
the context of the Mock LISA Data Challenges (MLDCs). 
In particular we apply the method on a single source data 
set in Gaussian and stationary noise (using the training 
data sets from MLDC-1) (hereafter Scenario A), a sin- 
gle source data set in which the noise is Gaussian and 
stationary plus non Gaussian contributions from a DWD 
galactic population (hereafter Scenario Bl) and addition- 
ally from binary black hole merger signals and extreme 
mass ratio inspirals (hereafter Scenario B2), with the 
model not accounting for the latter non Gaussian con- 
tributions; and to a data set with three (but we leave 
the total number of parameters as unknown) signals well 
separated in parameter space. These test examples al- 
low us to show the power of the approach and at the 
same time highlight some of the limitations in the cur- 
rent model implementation that have to be addressed in 
order to apply the method to data sets of much higher 
complexity (hereafter Scenario C). 

When using a Bayesian approach to data analysis, the 
end result is a full joint posterior probability distribu- 
tion over all parameters of the model. We wish to clarify 
the distinction between this kind of result which contains 
all the information which has been inferred about the 
model, and the more traditional "frequentist" approach, 
which typically quotes maximum likelihood values of the 
parameters along with a confidence interval. Neverthe- 
less, in order to test the robustness of the approach we 
calculate the 90% probability interval around the mode 
of the distribution in order to derive a counterpart to 
the frequentist 's confidence interval and test if the true 
parameter value lies within this regime. To accommo- 
date multimodal distributions we integrate the density 
along the probabilities starting from the highest prob- 
ability, the mode, towards lower probabilities until the 
90% mark is reached, not along the parameter value. In 
this sense it is possible to derive an interval that breaks 
up over the parameter regime with multiple start and end 
points. Although we use the mode to start integrating 
over the probability distribution, the mode in itself will 
not be used as information carrier or as counterpart to 
a frequentist 's maximum likelihood point. In particular, 
there is no intrinsic reason why the PDF on certain pa- 
rameters should follow a distribution which can be prop- 
erly expressed by a maximum likelihood point (within 
Gaussian distributions a maximum likelihood point may 
be safely derived). In general it would be inappropriate 
to simply give a maximum likelihood estimate or a mean 
as a result as this defeats the purpose of performing a 
Bayesian analysis, and would not encapsulate the infor- 
mation obtained from the data set. 
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A. Fixed model number: single source in Gaussian 
and stationary noise 

We start by considering the simplest data analysis case 
to demonstrate the performance of the algorithm that 
we have introduced here: the analysis of a data set con- 
taining a single DWD in Gaussian and stationary noise; 
we also assume to know a priori the number of signals 
present in the data set 

One challenge from the first round of the MLDC is to 
extract 20 "verification binaries" out of the data sets 1.1.2 
with total observation time of one year (Scenario A). Ver- 
ification binaries are known binary systems which have 
been observed electromagnetically, and are predicted to 
be clearly visible in LISA observations. Since these sys- 
tems are known beforehand, characteristics of the grav- 
itational wave emission can be deduced, a fact which 
should aid the extraction of these signals from the LISA 
data set. The MLDC released the frequency and the sky 
positions of 20 realistic binaries which lie within the sen- 
sitivity window of LISA. 

We perform a data analysis of these binaries in the 
frequency domain as already explained, therefore Fourier 
transforming the time series of the data set while cal- 
culating directly the Fourier series of the verification bi- 
nary signal. In this way we are a) able to cope with 
coloured noise levels as introduced in the MLDC data 
sets and, b) separate the signals of the 20 verification 
binaries since the frequencies of the gravitational wave 
emission are well separated and sparse over the entire 
observation window of LISA, c) decrease run times due 
to smaller data stretches to be evaluated compared to 
a time series analysis. Therefore, we are able to reduce 
this task to single-signal data analysis runs on a given 
verification binary and thus search for only one signal 
in a given frequency window. This scenario is very well 
suited to test the MCMC application of the first stage 
of the sampler, which includes adaptation of dependent 
proposal densities. For these reasons, this part of the 
analysis was performed without the Reversible Jump al- 
gorithm. 

Although frequencies and positions in the sky are 
given, we searched for all the seven parameters of the sig- 
nal. We use the information given about the frequency 
and sky position of the binaries solely as a starting point 
for our Markov chain. 

As we limit our analysis to verification binaries be- 
low or equal to 3 mHz, we found in AM CVn (Verifi- 
cation binary 1) our fiducial example. The parameters 
of the source are as follows: frequency fo = 1.944144 
mHz, ecliptic latitude d =0.65675 rad (parameter range 
0, 7r), ecliptic longitude (p =2.97249 rad (parameter range 
0,27r), scalar amplitude A =1.35705xl0 -22 , polarization 
phase ip =1.59740 rad (parameter range 0,7r/2), inclina- 
tion l =1.65742 rad (parameter range 0,7r) and initial 
phase $o =3.52547 (parameter range 0,27r). The pri- 
ors were chosen uniform and periodic over the domain of 
angles, and uniform and positive definite for the ampli- 



tude, So and /o- The initial values of our MCMC chain 
were chosen randomly to yield a match of more than 
90% with the true signal with the already named excep- 
tion of /, (p. We ran our MCMC up to one million 
Markov chain states, with an additional burn-in phase 
of ten thousand which was discarded after the run. The 
CPU run time was approximately 1 hour on the Tsunami 
cluster of the University of Birmingham, running on a 
single 2.4 GHz Intel Xeon CPU. 

Of particular interest is how adaptation affects the pro- 
posal densities and the quality of sampling. We present 
in Fig. [2] the acceptance rate per parameter over ongoing 
sampling time. It is visible, that within a few thousand 
iterations all individual acceptance rates but the noise 
level Sq are approaching the target of 0.25, as intended. 
The noise spectrum was found to be adapted quite slowly 
towards the acceptance so that even after 10 6 iterations 
the target is missed by 0.12. However we note that the 
target needs not to be reached penultimate; a proximity 
to the target is often sufficient to ensure secure sampling 
as we will see later in this section. In all other cases after 
reaching the target only minor corrections are performed, 
fading out the longer the run proceeds. We connect ac- 
ceptance rates with standard deviations (square root of 
the variance) of our Gaussian proposal densities to see 
how the width of the proposal gets adapted according to 
the target acceptance, as seen in Fig. [3| Once again the 
noise spectrum estimate is still converging towards a final 
estimate, but was found to already yield the true poste- 
rior density as seen later. For the other parameters it is 
visible that development of particular widths are often 
not clearly connected to acceptance rates, as e.g. in the 
polarization phase were we see acceptance rates to show 
an oscillating behaviour with clear decreasing stages and 
increasing stages during the first few iterations and the 
last few thousand. 

Marginalized posterior PDFs are presented in Fig. [4] 
We find for this particular challenge that the posterior 
densities follow Gaussian distributions. This is true in 
general for all the MCMC data analysis runs we per- 
formed on well separated verification binary signals (for 
further examples see e.g. Stroeer et al. ([TOjO . A pecu- 
liarity is the bimodal distribution for <£> which shows its 
modes separated by exactly In. This hints to a probable 
degeneracy in the polarisation angle modulo tt/2 within 
the LISA response function, a finding which is going to 
be investigated in future research. We note that <£>o does 
not carry any physical information as it is only the offset 
in time of the waveform at the beginning of the observa- 
tion. We therefore continue to discuss the remaining dis- 
tributions and its relevance to data analysis approaches 
without further considering this peculiarity. One partic- 
ular systematic feature visible in Gaussian marginalized 
posterior densities is a given offset between the mode or 
the most likely value of the PDF and the true value. We 
note that this offset is intrinsic to the method we use 
to sample from the posterior. Sampling techniques such 
as Metropolis sampling or Metropolis-Hastings sampling 
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Figure 2: Scenario A: Adaptation statistics for verification 
binary AM CVn (number 1); acceptance rates. We find con- 
vergence towards the desired acceptance rate of 0.25 in all 
the parameters but the noise level So, with the latter ongoing 
adaptation still yielding a reliable sampling of the posterior 
as seen e.g. in Tab. [I] 



Figure 3: Scenario A: Adaptation statistics for verification 
binary AM CVn (number 1); standard deviations. We see 
the standard deviations to converge towards a single optimal 
value in all the parameters but the noise level So. Neverthe- 
less, even with varying standard deviations in the noise level 
we sample robustly from the posterior. 



will return only approximates to the PDF, never the true 
PDF. The largest influence on the bias is found in sam- 
pling lengths, since convergence to the intrinsic PDFs is 
found to be asymptotically in time; sampling would thus 
be optimal for an infinitely long sampling run, but since 
we have to stop the code at some point bias is intro- 
duced (f3T|) . Current adaptive methods therefore target 
in general to speed up the asymptotic convergence. Fur- 
thermore the noise level is sufficiently large to confuse the 
likelihood, and is found responsible to add to the bias ([5]). 

We show in Tab. [I] the 90% probability interval for 
the marginalized posterior distribution and compare it 
to the true parameter value. We find the true parameter 
to always lie within the 90% probability interval. The 
bimodal distribution in <3> yields a probability interval 
that is broken into two parts. We therefore give two start 
points and two end points for the actual 90% interval and 
note the true value to lie within the second part of the 
interval. 





90 % probability interval 


injected value 


A [xlO- 22 ] 


1.119747 


1.48167 


1.357046 


f [mHz] 


1.944142 


1.944146 


1.9441457 


if [rad] 


2.935855 


3.088559 


2.972493 


& [rad] 


0.630025 


0.733202 


0.656745 


^ [rad] 


1.481676 


0.091941 


1.597401 


l [rad] 


1.614521 


1.753378 


1.657422 


$o [rad] 


0.212327,3.249962 


0.536263,3.701231 


3.525472 


So [xlO" 41 ] 


1.32436 


1.35161 


1.345472 



Table I: We show the 90% probability interval for the 
marginalized posterior distribution and the true parameter 
value for the DWD in the MCMC on scenario A. We find 
the true parameter to always lie within the 90% probability 
interval. The bimodal distribution in <3>o yields a probability 
interval that is broken into two parts. We therefore give two 
start points and two end points for the actual 90% interval 
and note the true value to lie within the second part of the 
interval. 



B. Verification binaries from the MLDC Round 2 

Round 2 of the MLDC changed the complexity of the 
data sets. The new observation time was set to 2 years, 
and a synthetic galactic population of compact object bi- 
naries was added to create non-Gaussian stochastic fore- 
ground (Scenario Bl). Challenge 2.2 in addition added 



4 to 6 massive black hole binaries and 5 extreme mass 
ratio inspiral (EMRI) signals (Scenario B2). Since these 
scenarios are considered realistic for LISA, it serves as an 
optimal playground to challenge the MCMC routine with 
adaptation of dependent proposal densities as demon- 
strated in the preceding section. The MLDC released 25 
frequencies and sky positions for stronger signals within 
the galactic population spread over the entire frequency 
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Figure 4: Scenario A: Marginalised PDFs for verification bi- 
nary AM CVn (number 1). Red lines denote the true value of 
the parameter. We observe Gaussian distributions in all the 
parameters with the peculiarity of a bimodal distribution in 
$o indicating a possible degeneracy of ^ modulo tt/2 in the 
LISA response function. We observe the true value always to 
be covered by the recovered marginal posterior distributions, 
a result that will be elaborated in Tab. [J 



band, which gives the 25 verification binaries to search 
for. 

Once again we limited our analysis below or equal to 3 
mHz; however we did not use the given set of Verification 
binaries to guide our fiducial source. We hand-picked a 
very strong DWD source with no strong neighbours as 
seen in the frequency domain from the galactic simula- 
tion key to reduce confusion (the strongest sources in the 
neighbourhood are a factor of ~ 7 smaller in amplitude) . 
The parameters of the source are as follows: frequency 
fo =0.87307 mHz, ecliptic latitude i? =-1.14579 rad (pa- 
rameter range — 7r/2,7r/2 as the second MLDC changed 
conventions in the coordinate system), ecliptic longitude 
ip =1.97821 rad (parameter range 0,27r), scalar ampli- 
tude A =8.43915 xlO" 22 , polarization phase i/j =0.39204 
rad (parameter range 0,7r/2), inclination i =0.43144 rad 
(parameter range 0,7r) and initial phase $0 =1.64716 
(parameter range 0,27r). Prior and MCMC setup were 
chosen as shown in the preceding section, with run time 
on a single 2.4 GHz Intel Xeon CPU of the Tsunami clus- 
ter now 2 hours. It was our expectation that stochastic 
foreground in this data set can be approximated as just 
an elevated constant noise level in the frequency win- 
dow of interest, therefore effectively reducing the signal 
to noise ratio of the signal as measured on instrumental 
noise alone. 

Fig. [5] compares adaptation as performed on Scenario 



Bl and Scenario B2, here showing acceptance rates. We 
see, that the MCMC on Scenario B2 does not yield 
largely different characteristics on adaptation than on 
Scenario Bl - adaptation is triggered by the character- 
istics of the source we aim to extract, not by the level 
of the noise. Thus in both cases adaptation was success- 
ful in every parameter, with the only exemption in Sq. 
Here we see the standard deviations of the noise level 
essentially underestimated all the time. This leads to al- 
most always accepted states as the chain does not move 
strongly enough. Nevertheless the trend slowly points 
toward the optimal acceptance, therefore does not show 
a runaway effect but a very slow convergence, a condi- 
tion sufficient for stable sampling of the posterior. Fig. [6] 
shows the standard deviations of our proposal densities. 
The stochastic foreground seems to add significantly to 
the instrumental noise level, a starting value close to the 
instrumental noise level has to be adapted continuously 
towards a higher level to account for this confusion. 

We present in Fig. [7] the marginalized posterior den- 
sities as recovered by our adaptive MCMC scheme for 
challenge training data set Scenario Bl and Scenario B2. 
We see all posteriors to cover the true value of the pa- 
rameter with the mode of the distribution close by. Fur- 
ther we see complex asymmetric shapes of the posterior - 
demonstrating mode and median of this posterior to state 
now insufficient measures of quantity and quality of the 
run. We see the recovered noise level Gaussian but too 
high due to confusion accounted towards the power of the 
noise, as expected (it should be of order 5 x 10~40). This 
is aggravated in Scenario B2 where we see an even higher 
level of noise due to additional contributions from mas- 
sive black hole binaries and EMRIs. Ongoing the widths 
of PDFs from Scenario B2 are wider than from Scenario 
Bl, mirroring an additional uncertainty as introduced 
by added MBHBs and EMRIs - the SNR of the signal 
was effectively lowered as even more power was added to 
the instrumental noise as confusion from EMRI and MB- 
HBs. The widening of the PDFs is better expressed in 
Tab. [il] were we quote the 90% probability interval of the 
marginalized posterior distributions for each individual 
MLDC. We highlight that once again every true param- 
eter value of the source lies within the 90% probability 
interval. 

We highlight, that an extraction of the source 
was successful without prior analysis and/or extrac- 
tion/subtraction of the massive black hole and the EMRI. 
We were able to directly apply our adaptive approach to 
this difficult data set. We further note that the marginal- 
ized posterior distributions for the amplitude never dis- 
play the possibility of a zero amplitude, thus the pos- 
sibility that the signal may not be in the data. This 
may be seen as the verification that we also unambigu- 
ously confirmed a detection of the signal in the data set. 
Nevertheless, the sub-optimal adaptation statistics of the 
noise level could indicate that we might have assigned 
background noise power to the signal. Though this is 
clearly not the case for the given specific examples we 



12 





90 % probability interval 


injected value 


MLDC Scenario Bl 


A [xl(T 22 ] 


7.52346 


11.9826 


8.439146 


f [mHz] 


0.8730691 


0.8730704 


0.87307 


ip [rad] 


1.92235 


2.0705 


1.978207 


& [rad] 


-1.21092 


-1.0920 


-1.145790 


^ [rad] 


0.099671 


1.47502 


0.392043 


l [rad] 


0.0516501 


0.917192 


0.431438 


$o [rad] 


5.72112 


2.52754 


1.647163 


MLDC Scenario B2 


A [xl(T 22 ] 


7.61029 


14.6392 


8.439146 


f [mHz] 


0.8730688 


0.8730709 


0.87307 


if [rad] 


1.90087 


2.13798 


1.978207 


# [rad] 


-1.20322 


-1.05211 


-1.145790 


^ [rad] 


0.0986202 


1.4652 


0.392043 


l [rad] 


0.0675063 


1.0741 


0.431438 


$o [rad] 


5.61281 


2.63971 


1.647163 



Table II: We show the 90% probability interval for the 
marginalized posterior distribution and the true parameter 
value for the DWD in the MCMC on MLDC Scenario Bl and 
MLDC Scenario B2. We find the true parameter to always lie 
within the 90% probability interval. The true noise level is 
unknown as it contains contributions from the unknown galac- 
tic DWD population (Scenario Bl, Scenario B2), BH mergers 
(Scenario B2) and EMRIs (Scenario B2) 



cannot generalize our results to other data analysis situ- 
ations without ongoing research beyond the scope of this 
paper. Additionally to prove a detection, the false alarm 
rate of our approach would have to be established, in 
particular how often do we extract a marginal posterior 
density in the amplitude that does not contain if we 
vary our template over the position and the frequency of 
a similar DWD. Again this is beyond the scope of this 
paper. 



C. Reversible jumps on verification binaries of the 
MLDC1 



In order to test the RJ-MCMC algorithm we turn our 
attention to a self-made problem. The verification bi- 
nary training data set IB. 1.2 (Scenario 3) contains three 
in frequency well separated from each other verification 
binaries in the frequency window 1.8 mHz to 2.1 mHz, 
well sepa rated as well from the remaining binaries (See 
Tab. Ill for parameter values). The lack of overlap thus 



presents a rather clean situation for any RJ-MCMC ap- 
proach; the correct amount of signals should be easily 
recovered when one restricts the run to this window and 
searches for more than 3 signals, say 8 signals in this 
example - and should thus show and demonstrate the 
RJ-MCMC algorithm best since the outcome should be 
trivial and understood. We propose 8 individual models 



to be compared to the data set, each model increasing 
the total number of signals by one, respectively. We ex- 
pect the likelihood of model 3, the model containing the 
correct number of signals, to be highest in value, with the 
RJ-MCMC almost solely concentrating on this model and 
only testing different models in turn with low probabili- 
ties to actually jump to these other models. 

The proposal densities for the RJ-MCMC stage of the 
sampler is constructed from two major blocks; on the 
one side multi-dimensional Gaussian kernels variable in 
parameter space propose movements within each individ- 
ual model. We therefore count 8 different multidimen- 
sional proposals, one for each model, with dimensional- 
ity increasing from model number to model number. We 
restrict the inner structure of the Gaussian to be block di- 
agonal in the variance-covariance matrix, with full struc- 
ture within each individual proposed signal and no cor- 
relation in between individual signals. Since the actual 
entries to the variance-covariance matrix are unknown to 
begin with the sampler performs MCMC pre-runs to es- 
tablish estimates on the base of the experienced structure 
of the posterior, here for each model in turn. We start the 
MCMC chains close to the values of the injected signals, 
here randomly chosen to yield more than 90% overlap 
with the original signal, and completely randomly for sig- 
nals 4 to 8 within the boundaries of our priors which are 
unrestricted except for the frequency, which is bound to 
the frequency window of interest. We perform 11000 it- 
erations, 1000 for the burn-in of the MCMC stage, 10000 
for the actual collection of information to the posterior, 
here storing the actual values of the state at every 10th 
iteration (thinning out to reduce auto-correlation of the 
chain) in order to estimate a posteriori the structure of 
the Gaussian kernels with correlation estimates in be- 
tween parameters per signal and standard deviation esti- 
mates per parameter. We are aware that 10000 iterations 
may be seen too restricted to many readers - the MCMC 
run is definitely too short to show the full structure of the 
posterior. Nevertheless, since we only want to estimate 
the structure of proposal densities for the second stage, 
we find this amount of iterations very well suited to give 
the spread of the marginal posteriors and to uncover cor- 
relations. Updates in the MCMC stage and within the 
RJ-MCMC stage are performed blockwise, however we 
set the blocking probability low in order to move care- 
fully through the parameter space, with a 30% chance 
of stopping the blocking at each random addition to the 
block. 

The RJ-MCMC stage samples 10 6 MCMC states trans- 
dimensionally to recover the posterior densities under 
considerations. Our RJ-MCMC undoubtedly identified 
model 3 to be the best matching as expected, with 
99.8905 percent probability to match the data. The re- 
maining models show 0.026 percent for model 1, 0.061 
for model 2, 0.022 for model 4, 0.0005 for model 5 and 
percent for the remaining models (below minimum accu- 
racy). The reason behind this very distinct result is found 
in the choice of the problem at hand. Given our data set 
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Figure 5: Scenario Bl (left panel) and Scenario B2 (right panel). Adaptation statistics for the hand-picked verification binary; 
acceptance rates. We find convergence towards the desired acceptance rate of 0.25 in all the parameters but the noise level 
So, where we see the standard deviation of the noise level essentially underestimated all the time. This leads to almost always 
accepted states as the chain does not move strongly enough. Nevertheless the trend slowly points toward the optimal acceptance, 
therefore does not show a runaway effect but a very slow convergence. This yields a stable sampling, as seen e.g. in Tab. [I] 
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Figure 6: Scenario Bl (left panel) and Scenario B2 (right panel). Adaptation statistics for the hand-picked verification binary; 
standard deviations. We see the standard deviations to converge towards a single optimal value in all the parameters but the 
noise level So. Nevertheless, even with strongly varying standard deviations in the noise level we sample robustly from the 
posterior. 
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Figure 7: Scenario Bl (left panel) and Scenario B2 (right panel). Marginalised PDFs for the hand-picked verification binary. 
Red lines denote the true values of the parameter. The true noise level is unknown as it contains contributions from the 
unknown galactic DWD population (Scenario Bl, Scenario B2), BH mergers (Scenario B2) and EMRIs (Scenario B2) 
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displaying three very strong sources well separated in fre- 
quency space with no stochastic foreground added it is 
obvious that models other than proposing three signals 
will have almost no likelihood to match the data. Any dif- 
ferent outcome than the distinct above would have posed 
a problem to the sampling algorithm. 

The RJ adaptation can be found to very quickly adapt 
in favor of model 3 as dictated by the experienced model 
selector posterior, as seen in Fig. [9] The probabilities for 
each model were set equivalent at the beginning of the 
RJ-MCMC main sampling stage (after 10.000 iterations 
burn-in), and are adapted towards the experienced PDF 
of the model indicator. We see this adaptation to reset 26 
times, a failsafe to ensure that adaptation is performed 
on the intrinsic PDF of interest and not towards a model 
in which the RJ sampler got stuck (e.g. local maxima of 
the likelihood in a given model posing a too high thresh 
to overcome to jump to different models, or other models 
not converged so far yielding too low likelihoods to jump 
to). After 26 resets ( 105.000 iterations) the sampler 
no longer experiences a runaway adaptation but remains 
in equilibrium. We see a final adaptation towards the 
posterior PDF and no more resets afterwards. 

The marginal posterior densities for model 3 are sh own 
in Fig [8] with corresponding 90% intervals in Tab. Ill 



As we already stated in the introduction to this result 
section, the intrinsic result of any (RJ)MCMC sampler 
is a joint posterior density. Marginalization is a post- 
processing routine that tries to untangle the parameters 
from the joint posterior. Shown marginal posterior den- 
sities now display two outcomes. On the one side every 
90% probability density covers the true parameter except 
for two $o distributions, one slightly off (a 95% proba- 
bility interval would include the true value) and one that 
peaks at lir offset compared to the true value. Here we 
clearly see once again in the latter case a possible degen- 
eracy in \I/ modulo tt/2 in the LISA response function. 
As $o does not carry any physical information we do not 
concentrate further on these distributions. We therefore 
assess the overall match of 90% interval with injected 
value as proof of the robustness of the code. On the other 
side we find posterior densities deformed within two sig- 
nals and two parameters A, i (the remaining parameter 
shows clean Gaussian-like posterior densities). We see 
the amplitude tailing towards larger amplitudes, in one 
instance even forming a secondary maxima, in combina- 
tion with skewed distributions for the inclination with 
its maximum towards larger values. It is not fully un- 
derstood why these deformations took place, however it 
has to be noted that the inclination angle determines the 
contribution of h+ and h x to the final detected strain 
(and thus amplitude), with 50% weighting in case of a 
system at which we look face on, and full weighting for 
h+ if we look at the system edge on. The weighting of 
/i +?x is mirrored in the detector response function, and 
it seems plausible that a degeneracy between two signals 
developed in the detector response of each signal as trig- 
gered by complimentary A and i values. This suspicion is 
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4.8852 
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DWD 3 
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2.005901 


2.0059 
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2.39401 
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$0 [rad] 
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4.37626 


Noise level So 


So [xlO" 41 ] 


1.10773 


5.88374 


1.345472 



Table III: We show the 90% probability interval for the 
marginalized posterior distribution and the true parameter 
value for the sources in the transdimensional RJ-MCMC on 
MLDC Scenario C, here as seen in model 3 . We find the true 
parameter to always lie within the 90% probability interval 
except for two <E>o distributions. 



further fuelled by noting that the true injected values for 
l in those two signals are almost l7r apart. We therefore 
might look at a degeneracy in i modulo tt in the LISA re- 
sponse function in case of multiple signals within a joint 
posterior. As the investigation of this effect is of technical 
nature to the LISA response function, and thus beyond 
the scope of this paper, we refer the reader to future re- 
search. We also note that the posterior for the noise level 
is much wider as found for a run on only one signal at 
the same frequency range, see e.g. Tab [TTT] compared to 
Tab.[TJ This is largely due to the fact that the noise level 
is not exactly constant over the given frequency range of 
1.8 mHz to 2.1 mHz, it shows a slight variation which 
we only approximated to be constant. For the other the 
skewness of A and i might have lead to a slight tailing to- 
wards larger values, but as before this cannot be proven 
without further research outside the scope of this paper. 
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Figure 8: Scenario C: The marginalized posterior densities for our RJ-MCMC example for the correctly identified model to 
contain the actual amount of signals in the data, here model 3 with 3 signals. We show the posteriors of the signals in order 
from top to bottom. The noise level So estimated besides the three signals is shown in the bottom right panel. Red lines denote 
the true value of the parameter. 
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Figure 9: Scenario C: The evolution of the RJ-MCMC model 
indicator proposal density for three signals. The probabilities 
for each model were set equivalent at the beginning of the RJ- 
MCMC main sampling stage (after 10.000 iterations burn-in), 
and are adapted towards the experienced PDF of the model 
indicator. We see this adaptation to reset 26 times, a failsafe 
to ensure that adaptation is performed on the intrinsic PDF of 
interest and not towards a model in which the RJ sampler got 
stuck. After 26 resets ( 105.000 iterations) the sampler reaches 
the equilibrium distribution and the adaptation towards the 
posterior PDF finalizes. 



VI. DISCUSSION 

Using Bayesian inference, we were able to calculate 
without manual finetuning the posterior probability dis- 
tribution of DWD signals in noisy data adaptively with 
an RJ-MCMC method. Our approach, based on a Ran- 
dom Walk Metropolis sampling algorithm, adapted ac- 
cording to a modified Adaptive Acceptance Probability 
technique was found to yield reliable results on three data 
analysis challenges: a) the recovery of verification bina- 
ries from TDI variables as found in MLDC Scenario A; b) 
the recovery of verification binaries from TDI variables 
as found in MLDC Scenario Bl and Scenario B2 with 
additional contributing background signals from a 1 mil- 
lion strong galactic population of double white dwarfs 
(Scenario Bl) and additionally added black hole merger 
signals and EMRIs (Scenario B2), c) the determination 
of the amount of verification binaries in a frequency snip- 
pet of Scenario C and the recovery of their astrophysical 
parameters. We note in particular the successful extrac- 
tion of the signal in Scenario B2. as here no attempt 
to identify and remove nuisance signals of mergers and 
EMRIS was included, but a fit-all-at-once-while-ignoring- 
t he-nuisance strategy was chosen. 

We found that our RJ-MCMC approach performs reli- 
ably. Reliability may be measured for once by its degree 
of offset or bias of the 90% integrated probability interval 
of the marginalized posterior distribution per parameter 
to the true parameter value. Scenario A found this off- 
set to be small and reasonable, we always find the true 



parameter value within the smallest interval to contain 
90% of MCMC states, and furthermore within one to 
two standard deviations as derived from the sample of 
the posterior. Scenario Bl finds this offset or bias aggra- 
vated, since power of the confusion background is placed 
within the spread of our verification binary signal in fre- 
quency. The sampler indeed gets confused. Nevertheless, 
setting the noise level as an additional unknown serves as 
a stabilising element of our MCMC, all but two MCMC 
chains were able to converge meaningful to the signal. 
RJ-MCMCs on Scenario C found the offset similar to 
MCMCs on Scenario A with additional widening, possi- 
bly due to degeneracies within the LISA response func- 
tion (<E>o to \£) and within signals (A to l). Adaptation 
within the MCMC stages was found to be determinis- 
tic and reliable, with the exception of the noise level, 
which was found to evolve too slowly. Our RJ-MCMC 
adaptation approach was proven to perform reliably and 
quickly, here mainly visible in the speedy and targeted 
adaptation towards the correct model indicator posterior 
after the burn-in of the sampler, securely recovering and 
progressively converging after forced resets of model in- 
dicator priors. 

We note, that described adaptation approach may not 
be the most sophisticated approach in adaptive sampling. 
However, it has the benefit of being easy to implement 
and easy to understand and control. Run times may be 
found slightly higher than in other non-adaptive schemes, 
since every proposed parameter is updated in turn to 
build a new Markov chain state. Nevertheless, we con- 
sider this loss of speed reasonable compared to the gain of 
our approach: a fully automatic sampler that performs 
simultaneously without ad-hoc assumptions or manual 
fine-tuning. We therefore see significant potential in our 
approach to build an automatic end-to-end pipeline for 
gravitational wave data analysis. As was shown in a dif- 
ferent publication of the authors (QjJ, this sampler may 
be combined with incoherent or coherent preruns to shed 
better light on some parameters, allowing the MCMC to 
converge faster onto the posterior distribution. 

Concluding, we revisit two major restrictions of the 
RJMCMC in this paper. First we implemented the RJM- 
CMC only over a small number of signals, which ques- 
tions the scalability of the algorithm. We note that the 
algorithm is in principal scalable to realistic numbers of 
white dwarf signals (tens of thousands) with no major 
modifications if computing resources are available. How- 
ever, the pre-run stage needs to explore the posterior 
density function of each proposed model in turn, and as 
each proposed model in turn adds one signal to the mix- 
ture of already proposed signals we find computing efforts 
to scale in the pre-run stage according to Y^k=o ^- We 
therefore have to simplify this stage to render the code 
feasible for the upcoming LISA data challenges. As the 
pre-run only serves to estimate the shape and structure 
of posterior density function of each proposed model, we 
find it plausible to introduce theoretical approximations 
that replace empirical simulations at the cost of adap- 
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tivity. Nevertheless, considering the unknown factor in 
the data analysis of GWs, adapting to the unknown with 
the help of empirical simulations has to be performed at 
least once to set the stage of thoretical calculations, and 
for this case future must show how one can speed up the 
prerun adaptation. Second we required the signals to be 
well separated in frequency. This separation was only 
introduced because of experienced coupling of parame- 
ters, like the amplitude/inclination coupling, obscurring 
resolved posterior density functions of e.g. neighbour- 
ing frequency waveforms in such a way that a success of 
the RJMCMC technique may be falsly questioned by the 
odd shape of the PDF. As the main goal of this paper is 
the introduction and demonstration of RJMCMC tech- 
niques, and as found effects are solely introduced by a 



degenerate description of the LISA response function, we 
find demonstrated examples truthfully representing the 
ability of the RJMCMC sampler, and refer the reader 
for a more complex study of the LISA problem to future 
publications. 
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also used a prime to identify the TDI variables used in 
the analysis to distinguish them from the TDI A, E and 
T that are usually described in the literature and are 
constructed not from X, Y and Z, but from a, /3 and 

7 <E3 

[34] It has to be noted that the Jacobian is essential to the 
acceptance ratio definition, not introduced for the trans- 
dimensional problem. In fact the Jacobian is formally 
present in the fixed dimension definition of the accep- 
tance ratio Eq. [23] but cancels to 1 all the time. In 
the case of certain trans-dimensional moves it no longer 
equals unity. 



